ABSTRACT


The Hearsay II speech understanding system being
developed at Carnegie-Mellon University has an independent
knowledge source module for each type of speech knowledge.
Modules communicate by reading, writing, and modifying
hypotheses about various constituents of the spoken utterance in
a global data structure. The syntax and semantics module uses
rules (productions) of four types: (1) recognition rules for
generating a phrase hypothesis when its needed constituents
have already been hypothesized; (2) prediction rules for inferring
the likely presence of a word or phrase from previously
recognized portions of the utterance; (3) respelling rules for
hypothesizing the constituents of a predicted phrase; and (4)
postdiction rules for supporting an existing hypothesis on the
basis of additional confirming evidence. The rules are
automatically generated from a declarative (i.e., non-procedural)
description of the grammar and semantics, and are embedded in a
parallel recognition network for efficient retrieval of applicable
rules. The current grammar uses a 450-word vocabulary and
accepts simple English queries for an information retrieval
system.


INTRODUCTION: THE PROBLEM

The  fundamental  problem  facing  the  syntax  and  semantics 
component  of  a speech  understanding  system  is  uncertainty.  The 
system  is  uncertain  about  a variety  of  questions,  including: 
whether  a given  word  is  really  uttered  by  the  speaker;  when  a 
recognized  word  begins  and  ends;  whether  a particular  interval 
of the utterance contains a silence, a filled pause ("er," "um,"
"uh"), an informationless interjection ("y'know," "I mean"), or an
information-bearing word or phrase; whether a recognized word
or phrase is used in a particular sense; etc. Any decisions made
on the basis of such uncertain information are potentially
incorrect and must therefore be reversible. The classical method
of reversing decisions is backtracking. Backtracking and
best-first evaluation of alternative parses are the primary strategies
employed by the Hearsay I speech understanding system (Reddy
et al., 1973a, 1973b).

In Hearsay II (Lesser et al., 1975), multiple alternatives are
represented explicitly in a global data structure (the "blackboard")
and considered in parallel rather than one at a time as in Hearsay I.
Processing is driven by independent, data-directed knowledge
source modules (KSs), which create, examine, and revise
hypotheses about the utterance that are stored on the blackboard.
One dimension of the blackboard is level of representation: an interval
of speech may be simultaneously represented at the acoustic,
phonetic, phonemic, syllabic, word, phrasal, and conceptual levels.
The KSs translate from one level to another with the ultimate
objective of representing the utterance at the conceptual level,
that is, understanding it. Hearsay II is a distributed logic system in
that control of processing is distributed heterarchically among
the KSs rather than organized hierarchically. Each KS is
responsible for deciding when it has useful information to
contribute to the analysis of the input.

The  syntax  and  semantics  KS  in  Hearsay  II  is  called  SASS, 
and  deals  with  hypotheses  representing  words  and  phrases 
perceived  or  expected  in  the  utterance.  From  SASS’s  viewpoint, 
the  blackboard  can  be  viewed  as  a chart  of  hypothesized  words 
as in Figure 1, which represents the word hypotheses generated
by lower-level KSs in response to the utterance "Tell me about
beef." In the figure, time goes from left to right and the vertical
dimension represents hypothesis credibility on a scale from -100
to 100, as estimated by other KSs. SASS's problem is to find the
most plausible sequence of temporally adjacent words.
Plausibility is defined by the credibility of the individual word
hypotheses and the grammaticality and meaningfulness of the
sequence. The concept of temporal adjacency is generalized to
tolerate fuzzy word boundaries, overlap between successive
words, silences in the middle of word sequences, and
unintelligible intervals. Since some of the uttered words may not
have been hypothesized, SASS must be able to expand the
solution space by inferring the likely presence of a missing word
on the basis of existing word hypotheses. Such inferences are
relatively weak, since several predictions may be plausible in a
given context. In the example of Figure 1, SASS hypothesizes
the missing word "tell" in the interval preceding "me about beef."
Since  SASS  is  uncertain  as  to  which  word  hypotheses  are 
correct,  it  also  makes  several  incorrect  word  predictions.  Figure 
2 shows  the  words  predicted  by  SASS  on  the  basis  of  the  words 
shown  in  Figure  1.  The  figures  do  not  reflect  the  fact  that  the 
various  hypotheses  are  generated  at  different  times  and  SASS 
starts  generating  predictions  prior  to  completion  of  the  word 
recognition  process. 
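To make the search problem concrete, the following minimal sketch finds the highest-rated chain of fuzzily adjacent word hypotheses in a chart like that of Figure 1. It is an illustration only. It assumes that each hypothesis is a (word, start, end, rating) record, that plausibility is simply the sum of the ratings, and that two words count as adjacent when the gap or overlap between them is within a fixed tolerance. The names WordHyp, adjacent, and best_sequence are hypothetical. The grammaticality and meaningfulness tests that SASS also applies are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHyp:
    word: str
    start: int   # hypothetical time units
    end: int
    rating: int  # credibility on the -100..100 scale


def adjacent(a: WordHyp, b: WordHyp, tolerance: int) -> bool:
    """Fuzzy adjacency: b may start a little before or after a ends."""
    return abs(b.start - a.end) <= tolerance


def best_sequence(hyps: list[WordHyp], tolerance: int = 5) -> list[WordHyp]:
    """Chain of fuzzily adjacent hypotheses with the highest summed rating."""
    order = sorted(hyps, key=lambda h: h.start)
    best: list[tuple[int, list[WordHyp]]] = []  # best chain ending at order[i]
    for i, h in enumerate(order):
        score, chain = h.rating, [h]
        for j in range(i):
            prev_score, prev_chain = best[j]
            if adjacent(order[j], h, tolerance) and prev_score + h.rating > score:
                score, chain = prev_score + h.rating, prev_chain + [h]
        best.append((score, chain))
    return max(best, key=lambda entry: entry[0])[1] if best else []
```

With credible, adjacent hypotheses for "me," "about," and "beef," this chain outscores competitors such as "bout" or "meat." Note that nothing here can recover the missing "tell." That requires the predictive inferences described below.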

In  order  to  control  the  potentially  explosive  search 
through  this  combinatorial  and  expanding  solution  space,  SASS 
must be able to reflect the variable reliability of its inference
rules and to relax its plausibility criteria dynamically so as to
stimulate processing on unrecognized portions of the utterance.
SASS must be able to use partial information to guide further
processing  in  useful  directions.  To  avoid  duplicated  computation, 
SASS  must  store  and  use  partial  parses,  which  are  intermediate 
computations  (plausible  subsequences)  common  to  many  potential 
parses.  SASS  must  combine  these  partial  parses  into  plausible 
complete  parses,  select  the  best  complete  parse,  interpret  the 
meaning  of  the  recognized  utterance,  and  respond  appropriately. 

The problems faced by SASS (uncertainty, combinatorial
search, fuzzy pattern-matching, strong and weak inferences, and
the need to exploit partial information) are common to many
large knowledge-based systems. Efficient solution of these
problems appears to require a system organization in which the
scheduling of inferential processes is sensitive to various
cooperative and competitive relationships among the inferred
hypotheses. For example, processing should be facilitated on an
hypothesis supported cooperatively by multiple sources of
information. Conversely, processing should be inhibited on an
hypothesis which competes with (i.e., is inconsistent with) a
strongly credible hypothesis. Inhibition in an environment of
uncertainty must be implemented nondeterministically, since the
weaker hypothesis may in fact be correct. Nondeterministic
inhibition is effected in Hearsay II by a focus-of-attention
mechanism which allocates computational resources so as to
consider the most promising hypotheses before others (Hayes-Roth
& Lesser, 1976).

The  approach  used  in  SASS  is  relevant  to  pattern 
recognition  for  its  fuzzy  pattern-matching;  to  problem  solving  for 
its flexible combination of bottom-up, top-down, forward
inferencing, and problem reduction mechanisms; and to
information retrieval and the problem of pattern-directed
function invocation for its efficient mechanism for continuously
monitoring  a data  base  for  occurrences  of  any  of  a large  number 
of  relational  patterns  or  templates. 


OVERVIEW OF METHOD

Given a declarative (i.e., non-procedural) description of the
target language which our system is to understand, we need to
convert it into behavior which is adequate to understand
utterances in the language efficiently and robustly. Our approach
has been to automate this conversion as much as possible.
Syntactic and semantic knowledge about the target language is
expressed in a compact, readable grammar. A compiler converts
the grammar into precondition-response productions. The
productions are embedded in a recognition network to enable
efficient continuous monitoring of the blackboard for stimuli
matching production preconditions. In general, many productions
will be invocable at any given time. Various scheduling policies
serve to hasten the invocation of productions which are
considered likely to generate useful (correct, relevant, and
necessary) results and to inhibit or defer less promising
invocations.

LINGUISTIC  KNOWLEDGE 

The  grammar  describing  the  target  language  is  expressed 
using  parameterized  structural  representations  (PSRs),  which  are 
sets of attribute-object pairs. We use a PSR to define a class of
words and phrases which can fulfill the same syntactic or
semantic function in the target language. The current target
language consists of simple English queries for a news retrieval
program. For example, the PSR

($CLASS: $QUERY, $PNAME: "PARSED QUERY",
  <: $GIMME+$WHAT,
  <: TELL+$ME+$RE+$TOPICS,
  <: WHAT+HAPPENED+$ANYWAY,
  <: WHAT+$BE+THE+$NEWS+$RE+$TOPICS,
  <: $BE+THERE+$ANY+$PIECES+$RE+$TOPICS,
  $ACTION: PASS,
  $LEVEL: 300)

defines the class "$QUERY" of possible queries in terms of its
alternative syntactic realizations. The attribute "<:" denotes
membership in the class. Each member of the class is a sequence
template whose constituents, separated by "+", are words or
phrases. Phrasal constituents are prefixed by "$" and defined in
turn by other PSRs. Additional attributes of the class are defined
by other components of the PSR. "$ACTION: PASS" means that
SASS's response upon recognizing an instance of any of the five
templates in the class should be to treat it as an instance of
$QUERY. The $LEVEL attribute estimates the relative
completeness of the partial parse underlying the hypothesized
phrase. The PSR

($CLASS: $TOPICS,
  <: $PLACE,
  <: $FOOD,
  <: $TECHNOLOGY,
  <: $SCIENCE,
  <: $GOVERNMENT,
  <: $POLITICS,
  <: $PEOPLE,
  <: $TOPICS+$CONJUNCTION+$TOPICS,
  $ACTION: PASS, $LEVEL: ^0)

defines the class of possible topics in the news in terms of its
semantic subclasses. The grammar for the current 450-word
target language consists of 113 PSRs.
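The PSR notation needs very little machinery. To show this, the two PSRs above can be written as plain Python data, with the "<:" members collected into a list of sequence templates. The dictionary layout and the helper names constituents and is_phrase are assumptions for this sketch. They are not the representation SASS actually uses. The $LEVEL value of the $TOPICS class is left unset.

```python
# Hypothetical in-memory form of two PSRs; "<:" lists the class members.
QUERY = {
    "$CLASS": "$QUERY",
    "$PNAME": "PARSED QUERY",
    "<:": [
        "$GIMME+$WHAT",
        "TELL+$ME+$RE+$TOPICS",
        "WHAT+HAPPENED+$ANYWAY",
        "WHAT+$BE+THE+$NEWS+$RE+$TOPICS",
        "$BE+THERE+$ANY+$PIECES+$RE+$TOPICS",
    ],
    "$ACTION": "PASS",
    "$LEVEL": 300,
}

TOPICS = {
    "$CLASS": "$TOPICS",
    "<:": ["$PLACE", "$FOOD", "$TECHNOLOGY", "$SCIENCE", "$GOVERNMENT",
           "$POLITICS", "$PEOPLE", "$TOPICS+$CONJUNCTION+$TOPICS"],
    "$ACTION": "PASS",
    "$LEVEL": None,  # left unset in this sketch
}


def constituents(template: str) -> list[str]:
    """Split a sequence template into its constituents at each "+"."""
    return template.split("+")


def is_phrase(constituent: str) -> bool:
    """Phrasal constituents carry the "$" prefix; words do not."""
    return constituent.startswith("$")
```

Keeping the grammar declarative in this way is what allows a compiler to derive the behavior rules from it mechanically.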

TYPES OF BEHAVIOR RULES

SASS has a repertoire of strong and weak methods,
represented by different types of behavior rules used in
understanding.

A recognition rule generates a phrase hypothesis in
response to sufficiently credible hypotheses for the phrase's
constituents. SASS considers an hypothesized constituent to be
recognizable if its credibility rating, determined by other KSs,
exceeds a minimum threshold for plausibility. The hypothesized
constituents may also have to satisfy some structural condition
such as temporal adjacency between sequential constituents of a
phrase. A recognition rule represents a strong inference; its
strength is the probability that the recognized constituents can
be interpreted as an instance of the phrase. For example, "beef"
can be interpreted as a food or as a complaint, depending on
context. Recognition rules drive processing upward toward a
complete parse of the utterance from plausible partial parses.
Recognition behavior can be thought of as bottom-up parsing.
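A recognition rule for a two-constituent sequence might look like the following minimal sketch. It assumes that hypotheses carry a start time, an end time, and a rating. It also assumes that "recognizable" means a rating above a plausibility threshold, and that adjacency allows a small gap or overlap. The threshold values are illustrative assumptions. So is the choice to rate the new phrase by its weaker constituent; this is not the rating policy SASS uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hyp:
    name: str    # word or phrase, e.g. "TELL" or "$ME"
    start: int
    end: int
    rating: int  # -100..100


def recognize_sequence(left: Hyp, right: Hyp, min_rating: int = 0,
                       max_gap: int = 5) -> Optional[Hyp]:
    """Bottom-up: concatenate two credible, temporally adjacent hypotheses."""
    if left.rating <= min_rating or right.rating <= min_rating:
        return None  # a constituent does not exceed the plausibility threshold
    if abs(right.start - left.end) > max_gap:
        return None  # constituents are not temporally adjacent
    # Illustrative rating choice: the phrase is no more credible than its weakest part.
    return Hyp(f"{left.name}+{right.name}", left.start, right.end,
               min(left.rating, right.rating))
```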

A prediction rule hypothesizes a word or phrase which is
likely to occur in the context of a previously recognized portion
of the utterance. Prediction rules drive processing outward in
time from "islands of plausibility," and are necessary since not all
words in a spoken utterance may be recognized bottom-up by
lower-level KSs. Predictive behavior can be thought of as
forward inferencing. The strength of a predictive inference is
the conditional probability that the predicted constituent occurs,
given that its predictive context has been recognized. This
strength is inversely related to the number of constituents which
can plausibly occur in the given context.
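The inverse relation between predictive strength and the number of plausible continuations can be illustrated with a toy estimate. Purely for illustration, this sketch assumes that the k constituents plausible in a context are equally likely, so each is predicted with strength 100/k. The expectation table and the function names are hypothetical. The strengths SASS actually computes are not claimed to follow this formula.

```python
def prediction_strength(num_plausible: int) -> int:
    """Toy estimate (0..100), assuming k equally likely constituents: 100 / k."""
    if num_plausible < 1:
        raise ValueError("at least one plausible constituent is required")
    return round(100 / num_plausible)


def predict(context: str, expectations: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Forward inference: hypothesize constituents that may occur given `context`.

    `expectations` is a hypothetical table from a recognized constituent to the
    constituents that can plausibly occur next to it.
    """
    options = expectations.get(context, [])
    if not options:
        return []
    strength = prediction_strength(len(options))
    return [(option, strength) for option in options]
```

A context that admits a single continuation yields a full-strength prediction. A context admitting many continuations yields many weak ones, which is why such predictions are candidates for deferral or for use only as postdiction.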

A respelling rule enumeratively hypothesizes the
constituents of a predicted phrase, by subdividing an
hypothesized sequence into hypotheses for its sequential
constituents, or by splitting an hypothesized class into alternate
hypotheses for its various members. Respelling rules drive
processing downward toward the word level, so that high-level
phrasal predictions can ultimately be tested word-by-word by
lower-level KSs. Respelling can be thought of as top-down
behavior or generation of subgoals from goals.
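Both kinds of respelling can be sketched in a few lines. The sketch assumes that a sequence is split into its first constituent and the remaining subsequence, consistent with the pairwise decomposition CVSNET performs. Class membership comes from a hypothetical grammar table that maps a class name to its members.

```python
def respell(phrase: str, grammar: dict[str, list[str]]) -> list[list[str]]:
    """Top-down: expand a predicted phrase one level toward the word level.

    A sequence template ("A+B+C") yields one alternative made of two sequential
    parts (first constituent, remaining subsequence). A class yields one
    alternative per member listed in the hypothetical `grammar` table.
    """
    if "+" in phrase:
        first, rest = phrase.split("+", 1)
        return [[first, rest]]
    return [[member] for member in grammar.get(phrase, [])]
```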

Finally, a postdiction rule solicits post hoc support for (i.e.,
serves to increase the credibility ratings of) existing hypotheses
from other hypotheses in whose context they are plausible.
Postdiction rules include prediction and respelling rules which are
too weak to justify creation of hypotheses, but can contribute
useful information when the hypotheses already exist. For
example, an expectation for an instance of $TOPICS following the
word "about" should not be respelled into hypotheses for all the
nouns in the vocabulary, since to do so would explode the search
space. However, once the word "beef" is hypothesized in the
correct time interval on the basis of other knowledge, the
hypothesis should receive support from the expectation for a
topic word.

Postdiction rules serve three functions: they allow
cooperation between inferences which support the same
hypothesis on the basis of different evidence; they allow words
and phrases hypothesized with initial low credibility ratings to be
recognized on the basis of their contextual plausibility; and they
help focus attention in productive directions by increasing the
ratings of hypotheses which are contextually plausible (and thus
relatively likely to be correct) so that processing on them is
scheduled sooner. In the sense that postdiction responds to
weakly-rated hypotheses by seeking causal antecedents
(predictors) for them, postdiction can be thought of as post hoc
inferencing or "twenty-twenty hindsight."
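The defining property of a postdiction rule is that it strengthens existing hypotheses but never creates new ones. The sketch below illustrates this. Its update formula moves the rating toward 100 in proportion to the rule strength and the supporter's rating; this formula is an assumption for illustration only. The function name and the rating dictionary are likewise hypothetical.

```python
def postdict(existing: dict[str, int], target: str, supporter_rating: int,
             strength: int) -> dict[str, int]:
    """Return updated ratings; `target` gains support only if it already exists.

    Illustrative update (an assumption): move the rating toward 100 by a
    fraction (strength / 100) * (supporter_rating / 100) of the remaining distance.
    """
    if target not in existing or supporter_rating <= 0:
        return dict(existing)  # too weak to create a hypothesis on its own
    updated = dict(existing)
    current = updated[target]
    boost = (100 - current) * strength * supporter_rating // 10000
    updated[target] = current + boost
    return updated
```

For example, with a weakly rated "beef" at 20, support from a $TOPICS expectation rated 50 through a rule of strength 88 raises "beef" to 55. No rating is ever invented for a topic word that no KS has hypothesized.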

CONVERSION OF STATIC KNOWLEDGE TO BEHAVIOR RULES

Most of the information necessary for understanding the
target language is implicit in the grammar which describes it. The
automatic conversion of this static information into a usable
procedural form is effected by a simple compiler called CVSNET,
which translates the PSRs into recognition, prediction, respelling,
and postdiction rules. A few rules hand-coded in explicitly
procedural form are added, for example a rule that prints a
message when a sentence is recognized. The only linguistic
knowledge in CVSNET itself is an elementary understanding of
sequences and classes. CVSNET decomposes each sequence
template c1+c2+...+cn into pairs of subsequence templates. For
example, from the sequence template TELL+$ME+$RE+$TOPICS,
CVSNET generates the new templates $ME+$RE+$TOPICS and
$RE+$TOPICS.
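The decomposition in this example can be reproduced by a short function. We read "pairs of subsequence templates" as a first constituent paired with the remaining subsequence. This reading is our assumption, chosen because it yields exactly the two new templates listed above. The function names are hypothetical.

```python
def subsequence_templates(template: str) -> list[str]:
    """New templates c2+...+cn, c3+...+cn, ..., c(n-1)+cn for c1+c2+...+cn."""
    parts = template.split("+")
    return ["+".join(parts[i:]) for i in range(1, len(parts) - 1)]


def pair_splits(template: str) -> list[tuple[str, str]]:
    """Each (sub)sequence as (first constituent, remaining subsequence)."""
    parts = template.split("+")
    return [(parts[i], "+".join(parts[i + 1:])) for i in range(len(parts) - 1)]
```

Every pair has exactly two halves, so the recognition rule for any sequence only ever needs to concatenate two adjacent hypotheses.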

CVSNET then generates the appropriate rules for each
template. The recognition rule for a sequence is to concatenate
its hypothesized subsequences provided they are temporally
adjacent and sufficiently credible. The respelling rule respells a
predicted sequence into its two subsequences. Prediction rules
are generated to predict the remaining constituents of the
sequence when a subsequence of it has been recognized.
Similarly, CVSNET generates rules for recognizing an instance of
a class from an hypothesized constituent of the class and for
respelling a predicted class into its constituents. CVSNET
estimates the strength of each such rule as an inverse function of
class size. CVSNET also generates the relevant postdiction rules.
Some of the rules generated from the PSRs are shown below;
rule type is indicated by the type of arrow separating stimulus
and response ("->" for recognition, "=>" for prediction, "+>" for
respelling, and "<=" for postdiction), and rule strength is shown in
parentheses.

TELL & $ME -> TELL+$ME < CONCATENATE (100) (100) >
TELL & $ME <= TELL+$ME < POSTDICT!SEQ (100) (100) >
TELL+$ME +> TELL & $ME < RESPELL!SEQ (100) (100) >
$ME => TELL < PREDICT!LEFT (50) >
TELL <= $ME < POSTDICT!LEFT (50) >
TELL => $ME+$RE+$TOPICS < PREDICT!RIGHT (100) >
$ME+$RE+$TOPICS <= TELL < POSTDICT!RIGHT (100) >
$FOOD -> $TOPICS < PASS (100) >
$TOPICS +> V < RESPELL!CLASS (70) >
$FOOD <= $TOPICS < POSTDICT!ELEMENT (88) >
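Given one (left, right) split of a sequence, the sequence rules in the listing above can be emitted mechanically. This sketch writes the rules for a single split in the listing's notation. The source does not give CVSNET's exact strength computation, so the two prediction strengths are passed in as parameters and the concatenation strengths are omitted. The function name is hypothetical.

```python
def sequence_rules(left: str, right: str, s_left: int, s_right: int) -> list[str]:
    """Emit the rule family for the sequence left+right, in the listing's notation.

    s_left:  strength for predicting `left` from `right` (and its postdiction);
    s_right: strength for predicting `right` from `left` (and its postdiction).
    """
    whole = f"{left}+{right}"
    return [
        f"{left} & {right} -> {whole} < CONCATENATE >",        # recognition
        f"{left} & {right} <= {whole} < POSTDICT!SEQ >",       # postdiction
        f"{whole} +> {left} & {right} < RESPELL!SEQ >",        # respelling
        f"{right} => {left} < PREDICT!LEFT ({s_left}) >",      # prediction
        f"{left} <= {right} < POSTDICT!LEFT ({s_left}) >",
        f"{left} => {right} < PREDICT!RIGHT ({s_right}) >",
        f"{right} <= {left} < POSTDICT!RIGHT ({s_right}) >",
    ]
```

Applied to every split of every template in 113 PSRs, a generator of this kind quickly produces the thousands of rules mentioned in the next section.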

The linguistic knowledge expressed compactly in the
grammar is represented highly redundantly in the generated
rules. This redundancy provides the basis for robust
performance in the errorful domain of speech: in regions of the
utterance where strong inferences (recognition rules) are
inadequate (for example, because lower-level KSs have failed to
hypothesize some of the uttered words), weaker inferences must
be applied in order for the utterance to be understood.

IDENTIFICATION OF INVOCABLE RULES

All of the rules described have the form

[precondition(x1, x2, ..., xn) =f=> response(x1, x2, ..., xn)],

signifying that a specified response can be inferred with strength f from the
objects x1, x2, ..., xn whenever these objects are in the
relationships described by the associated precondition. The large
number of rules required even in a relatively simple system (over
3000 rules for a 450-word vocabulary) necessitates an efficient
means of continuously monitoring the blackboard to determine
which rules are currently invocable because data satisfy
their preconditions.

This problem is solved by embedding the rules in an
automatically compilable recognition network (ACORN), as
discussed elsewhere (Hayes-Roth & Mostow, 1975). In brief,
each grammatical constituent (word or phrase) is assigned a
unique node in the network. Rules whose preconditions refer to
the constituent are stored at the node. Whenever an hypothesis
for the constituent is created or revised, its node is activated and
the relevant rules become invocable.
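The node-per-constituent idea can be approximated with a dictionary from constituent names to the rules whose preconditions mention them. This is a much-simplified, hypothetical stand-in for ACORN, which is a compiled network rather than a hash table. Here a rule is just a record with a name, a strength f, and the constituents its precondition refers to.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    strength: int                  # the inference strength f
    precondition: tuple[str, ...]  # constituents the precondition refers to


class RuleIndex:
    """One "node" per constituent, holding the rules that mention it."""

    def __init__(self, rules: list[Rule]) -> None:
        self.nodes: dict[str, list[Rule]] = defaultdict(list)
        for rule in rules:
            for constituent in set(rule.precondition):
                self.nodes[constituent].append(rule)

    def activate(self, constituent: str) -> list[Rule]:
        """Called when a hypothesis for `constituent` is created or revised."""
        return list(self.nodes.get(constituent, []))
```

Activation touches only the rules stored at one node, so the cost of noticing a new hypothesis does not grow with the total number of rules.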

PRINCIPLES  OF  CONTROL 

The rule preconditions are defined in terms of various
thresholds for plausibility, temporal adjacency, etc. These
thresholds can be given values specific to a particular region of
the utterance and are dynamically modifiable. Thus rules are
invoked not only in response to new hypotheses but also in
response to local threshold changes. This mechanism allows
flexible matching of rule preconditions. Thresholds can be
relaxed in unrecognized regions of the utterance to permit
localized application of methods whose weakness would cause
combinatorial explosion if they were applied uniformly throughout
the utterance.
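Region-specific thresholds can be pictured as a list of time intervals, each with its own recognizability threshold, falling back to a global default. In this hypothetical sketch a hypothesis is reduced to a (word, time, rating) triple. Relaxing a threshold returns the words that newly pass it. At that point, rules would fire in response to a threshold change rather than to a new hypothesis.

```python
from dataclasses import dataclass, field


@dataclass
class Thresholds:
    default: int = 20
    # (start, end, threshold) intervals; later entries override earlier ones.
    regions: list[tuple[int, int, int]] = field(default_factory=list)

    def at(self, t: int) -> int:
        """Threshold in force at time t (the most recently set region wins)."""
        for start, end, value in reversed(self.regions):
            if start <= t < end:
                return value
        return self.default

    def relax(self, start: int, end: int, value: int,
              hyps: list[tuple[str, int, int]]) -> list[str]:
        """Set a local threshold; return words (word, time, rating) that newly pass."""
        before = {w for w, t, r in hyps if r > self.at(t)}
        self.regions.append((start, end, value))
        return [w for w, t, r in hyps if r > self.at(t) and w not in before]
```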

Hypotheses are explicitly linked in the data base to
hypotheses which support them inferentially, and the links are
marked with the strengths of the inferences. A rating policy
module (RPOL) rates the plausibility of new hypotheses on the
basis of the ratings of the hypotheses which support them and
the strengths with which they do so. RPOL updates these ratings
when an hypothesis receives new support or when the rating of
one of its supporting hypotheses is changed. Hypotheses are
rated separately on their contextual plausibility and on the
extent to which they are supported by lower-level hypotheses.
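One simple rating policy consistent with this description scales each supporter's rating by the strength of its link and keeps the strongest contribution. It computes this separately for contextual and for lower-level support. The combination rule (a maximum of scaled ratings) is an assumption for illustration. The source does not give RPOL's formula, and the function names are hypothetical.

```python
def combine(links: list[tuple[int, int]]) -> int:
    """Rating from (supporter_rating, strength) links; strength is 0..100.

    Illustrative policy (an assumption): the strongest scaled support wins.
    """
    return max((rating * strength // 100 for rating, strength in links), default=0)


def rate(context_links: list[tuple[int, int]],
         lower_links: list[tuple[int, int]]) -> tuple[int, int]:
    """Rate separately on contextual plausibility and on lower-level support."""
    return combine(context_links), combine(lower_links)
```

Under this policy the rating is a pure function of the current links. Re-rating after new support arrives, or after a supporter's rating changes, therefore amounts to recomputing with the updated values.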

The combinatorial search can be controlled by modifying
the appropriate threshold values. For example, the search can
be broadened or narrowed by relaxing or tightening criteria for
recognizability, since the solution space consists only of
sequences of recognizable words. A best-first search policy can
be implemented simply by ordering rule invocations according to
the strengths of the rules and the plausibility ratings of the
hypotheses matching the rules' preconditions. The search can be
further focused by inhibiting low-level processing within a
region already accounted for by a credible high-level hypothesis.
Of course, this policy must be pursued with caution, since the
high-level hypothesis may be incorrect. Cautious inhibition is
implemented as deferred processing. A similar policy of
procrastination can be used to defer application of weak
inferences in a region until strong methods fail. An inferential
process can be deferred by scheduling it with low priority (so
that it may never in fact be executed), or by scheduling it only
when the relevant thresholds are relaxed. The latter mechanism
permits reconsideration of previously rejected alternatives.
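A best-first agenda with deferral can be sketched with Python's heapq module. Priority here is rule strength times hypothesis rating. A deferred invocation is pushed below all undeferred ones, so it runs only when nothing better remains and may never run at all. The product formula, the penalty constant, and the class name Agenda are assumptions made for this sketch.

```python
import heapq
import itertools
from typing import Optional


class Agenda:
    """Best-first agenda of pending rule invocations (hypothetical sketch)."""

    DEFER_PENALTY = 1_000_000  # larger than any |strength * rating| product

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._order = itertools.count()  # FIFO among equal priorities

    def schedule(self, invocation: str, rule_strength: int, hyp_rating: int,
                 deferred: bool = False) -> None:
        priority = rule_strength * hyp_rating  # strength 0..100, rating -100..100
        if deferred:
            priority -= self.DEFER_PENALTY  # runs only after all undeferred work
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._order), invocation))

    def pop_best(self) -> Optional[str]:
        """Next invocation to execute, or None if the agenda is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```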

Discourse rules can also help to focus the search. For
example, an hypothesis that the current topic of conversation is
food increases the a priori probability that the word "beef" will
be uttered. If we can predict subject matter or syntax from any
one of many knowledge elements (e.g., a recognized cue word in
the same utterance, semantic analysis of previous utterances,
knowledge of the particular speaker's interests), we can create
such an hypothesis. This form of semantic and syntactic priming
is non-restrictive in that it does not preclude recognizing an
utterance which is inconsistent with an hypothesized topic of
conversation or an expectation for a particular grammatical
construction. The mechanism is also graceful in that it does not
impose a strict hierarchy of topical domains, and in fact tolerates
ambiguity and uncertainty in the expectations generated by
previous discourse.
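Non-restrictive priming can be sketched as a bonus that only ever raises ratings. In this hypothetical sketch, belief in each hypothesized topic is a number between 0 and 1, and several topics may be believed at once. A word not associated with any believed topic simply keeps its rating. The boost size and the topic table are illustrative assumptions.

```python
def primed_rating(word: str, rating: int,
                  topic_words: dict[str, set[str]],
                  topic_belief: dict[str, float],
                  max_boost: int = 20) -> int:
    """Raise (never lower) a word's rating according to belief in topics using it.

    Words outside every hypothesized topic keep their rating, so priming
    never rules out an utterance; ambiguous topic beliefs can coexist.
    """
    belief = max((topic_belief.get(topic, 0.0)
                  for topic, words in topic_words.items() if word in words),
                 default=0.0)
    return min(100, rating + round(max_boost * belief))
```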

Inexact matching can also be carefully controlled with
thresholds. An interval of silence in the middle of an utterance
can be accepted by relaxing temporal adjacency thresholds in the
region of the silence so that hypothesized sequence constituents
temporally separated by the silence will be considered
temporally adjacent. For example, if the speaker says "Tell me
about ... beef," this mechanism allows the words "about" and
"beef" to be considered temporally adjacent. Interjections and
unclear intervals of speech can be nondeterministically ignored
by treating them as silences. Sometimes the uttered words
cannot be recognized by lower-level KSs even after SASS
hypothesizes them on the basis of surrounding context. In such
cases, partially-matched phrases can be recognized by lowering
credibility thresholds in unintelligible intervals so that unfulfilled
expectations for missing constituents are treated as if they had
been fulfilled. These mechanisms can even be used to tolerate
some variation from the target language by ignoring extra
verbiage not accounted for in the grammar and by filling in
omitted constituents required by the grammar.
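Treating silences and interjections as ignorable can be sketched as an adjacency test that subtracts ignorable intervals from the gap between two words. The interval representation, the tolerance, and the function name are assumptions for illustration. The example mirrors "Tell me about ... beef" with invented times.

```python
def adjacent_ignoring(end_a: int, start_b: int,
                      ignorable: list[tuple[int, int]],
                      tolerance: int = 5) -> bool:
    """Fuzzy adjacency in which listed intervals (silence, fillers) count as empty.

    `ignorable` holds disjoint (start, end) intervals; the part of each that
    falls between end_a and start_b is subtracted from the gap.
    """
    ignored = sum(max(0, min(e, start_b) - max(s, end_a)) for s, e in ignorable)
    return abs((start_b - end_a) - ignored) <= tolerance
```

With "about" ending at 40 and "beef" starting at 75, the two words are not adjacent. Once the silence from 42 to 73 is marked ignorable, the effective gap shrinks to 4 and they are.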

PERFORMANCE EVALUATION

The contribution of each KS in Hearsay II is highly
dependent upon the behavior of the others. Consequently,
SASS's performance is difficult to evaluate. For instance, SASS's
prediction of the missing word "tell" in the previous example may
have been critical to recognition of the utterance. On the other
hand, the word-hypothesizer KS might eventually have lowered
its own thresholds enough to have weakly hypothesized the
missing "tell." In this case, SASS's postdiction of the hypothesized
"tell" from its surrounding context might have been critical in
increasing its credibility rating sufficiently to permit it to be
recognized.

Despite  the  complex  dynamics  of  the  integrated  system,  we 
do  have  an  evaluation  methodology  for  SASS  which  will  be 
pursued  in  the  next  year.  Basically,  our  strategy  is  to  generate  a 
variety  of  artificial  problems,  each  defined  by  a set  of 
hypothesized  words,  and  measure  the  elapsed  time  until  SASS 
parses  the  utterance.  In  particular,  we  should  be  able  to 
evaluate the relative efficacy of the four types of behavior rules
in  overcoming  various  kinds  of  error  in  the  artificial  input.  If  we 
can  then  estimate  the  relative  frequencies  of  different  kinds  of 
errors  generated  by  lower  level  KSs,  we  can  attempt  to  optimize 
SASS’s  behavioral  profile. 

CONCLUSION 

There are many functions to be performed by a syntax and
semantics knowledge source within a speech understanding
system. In addition to simply parsing a sentence, the knowledge
source must use a variety of strong and weak inferencing
methods to hypothesize missing constituents and adduce support
for existing hypotheses found in appropriate contexts. A
production system using four types of rules has been developed
to implement such desirable "knowledgeable" behaviors, which
are automatically inferred from a simple declarative
representation of the language to be understood. By making the
invocation of a rule dependent upon both the credibility of
the data matching the rule's preconditions and the estimated
strength of the rule as a useful inference, the entire search
process may be controlled so as to pursue dynamically modifiable
global and local processing objectives. In sum, such a production
system provides a general framework for representing
"knowledgeable" syntactic and semantic behaviors. Moreover, the
fine computational grain of the behavior rules makes possible the
flexible and precise control needed to avoid a combinatorial
explosion in the search for a plausible interpretation of
continuous speech.
Figure 1. Words hypothesized bottom-up in response to the utterance "Tell me about beef."
(Markers in the figure indicate the correct hypotheses and the hypothesized beginning and end of the utterance.)
Figure 2. Words predicted by SASS on the basis of the hypotheses shown in Figure 1.